Intelligent Automatic Configuration Parameter Tuning for Big Data Processing Platforms Using Machine Learning and Collaborative Filtering

Author Details

Naga Charan Nandigama

Journal Details

Published: 30 December 2020 | Article Type: Research Article

Abstract

The exponential growth of big data processing has necessitated efficient parameter tuning mechanisms for distributed computing platforms such as Hadoop and Apache Spark. Manual configuration optimization is time-consuming and inefficient, while existing auto-tuning methods introduce significant overhead. This paper proposes an intelligent online parameter tuning framework that combines Singular Value Decomposition (SVD) based collaborative filtering, deep neural networks, and reinforcement learning to automatically optimize Hadoop/Spark configuration parameters. The framework incorporates a configuration repository generator based on genetic algorithms and a machine learning-based recommendation engine that reduces parameter optimization time by 24-30% while maintaining performance accuracy within 14.32% of optimal configurations. Experimental evaluation on a 4-node Hadoop cluster demonstrates superior performance across diverse workloads (WordCount and Sort) with dataset sizes ranging from 1 GB to 16 GB. The proposed approach achieves a 13% average improvement in memory utilization and adapts robustly to dynamic cluster conditions through online learning mechanisms.

Keywords: Big Data, Parameter Tuning, Collaborative Filtering, SVD, Machine Learning, Hadoop, Distributed Computing.
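
To make the collaborative-filtering idea in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a small workload-by-configuration matrix of measured runtimes with gaps (NaN) where a configuration was never tried, fills those gaps with a truncated SVD, and recommends the configuration with the lowest predicted runtime. The function names `predict_runtimes` and `recommend_config`, the mean-imputation step, and all runtime values are hypothetical and chosen only for illustration; the deep learning and reinforcement learning components are not covered.

```python
import numpy as np


def predict_runtimes(runtimes: np.ndarray, rank: int = 2) -> np.ndarray:
    """Estimate missing (NaN) runtimes with a rank-`rank` truncated SVD.

    Rows are workloads, columns are candidate configurations.
    Assumption: every workload has at least one measured runtime.
    """
    observed = ~np.isnan(runtimes)
    # Impute each gap with the workload's mean measured runtime (an assumption).
    row_means = np.nanmean(runtimes, axis=1, keepdims=True)
    filled = np.where(observed, runtimes, row_means)

    # Factor the mean-centered matrix and keep the top `rank` singular values.
    u, s, vt = np.linalg.svd(filled - row_means, full_matrices=False)
    k = min(rank, len(s))
    low_rank = (u[:, :k] * s[:k]) @ vt[:k, :]

    # Keep real measurements; use the low-rank estimate only for the gaps.
    return np.where(observed, runtimes, low_rank + row_means)


def recommend_config(runtimes: np.ndarray, workload: int, rank: int = 2) -> int:
    """Return the column index of the configuration with the lowest predicted runtime."""
    predicted = predict_runtimes(runtimes, rank)
    return int(np.argmin(predicted[workload]))


if __name__ == "__main__":
    # Hypothetical runtimes in seconds: 3 workloads x 4 configurations.
    history = np.array([
        [120.0, 95.0, 110.0, 130.0],
        [240.0, np.nan, 215.0, 260.0],
        [60.0, 48.0, np.nan, 66.0],
    ])
    print(predict_runtimes(history))
    print(recommend_config(history, workload=1))
```

A single imputation pass keeps the sketch short; an online variant would recompute the factorization as new runtime measurements arrive.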

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright © Author(s) retain the copyright of this article.

How to Cite

Citation:

Naga Charan Nandigama (2020-12-30). "Intelligent Automatic Configuration Parameter Tuning for Big Data Processing Platforms Using Machine Learning and Collaborative Filtering." Volume 4, Issue 2, pp. 56-64.